NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Statistical Inference for Lindley Random Walks with Correlated Increments

Grant, John; Kong, Jiajie; Liu, Xin; Lund, Robert; Wood, Jonathan (February 2024, Journal of statistical computation and simulation)
Richard Krutchkoff (Ed.)
Full Text Available
Genomics of cold adaptations in the Antarctic notothenioid fish radiation

https://doi.org/10.1038/s41467-023-38567-6

Bista, Iliana; Wood, Jonathan M.; Desvignes, Thomas; McCarthy, Shane A.; Matschiner, Michael; Ning, Zemin; Tracey, Alan; Torrance, James; Sims, Ying; Chow, William; et al (December 2023, Nature Communications)

Abstract Numerous novel adaptations characterise the radiation of notothenioids, the dominant fish group in the freezing seas of the Southern Ocean. To improve understanding of the evolution of this iconic fish group, here we generate and analyse new genome assemblies for 24 species covering all major subgroups of the radiation, including five long-read assemblies. We present a new estimate for the onset of the radiation at 10.7 million years ago, based on a time-calibrated phylogeny derived from genome-wide sequence data. We identify a two-fold variation in genome size, driven by expansion of multiple transposable element families, and use the long-read data to reconstruct two evolutionarily important, highly repetitive gene family loci. First, we present the most complete reconstruction to date of the antifreeze glycoprotein gene family, whose emergence enabled survival in sub-zero temperatures, showing the expansion of the antifreeze gene locus from the ancestral to the derived state. Second, we trace the loss of haemoglobin genes in icefishes, the only vertebrates lacking functional haemoglobins, through complete reconstruction of the two haemoglobin gene clusters across notothenioid families. Both the haemoglobin and antifreeze genomic loci are characterised by multiple transposon expansions that may have driven the evolutionary history of these genes.
more » « less
Full Text Available
Chromosome level genome assembly of the Etruscan shrew Suncus etruscus

https://doi.org/10.1038/s41597-024-03011-x

Bukhman, Yury V.; Meyer, Susanne; Chu, Li-Fang; Abueg, Linelle; Antosiewicz-Bourget, Jessica; Balacco, Jennifer; Brecht, Michael; Dinatale, Erica; Fedrigo, Olivier; Formenti, Giulio; et al (February 2024, Scientific Data)

Abstract Suncus etruscusis one of the world’s smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew’s small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with the 2.472 Gbp primary pseudohaplotype and 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.
more » « less
Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies

https://doi.org/10.1038/s41592-022-01440-3

Mc Cartney, Ann M.; Shafin, Kishwar; Alonge, Michael; Bzikadze, Andrey V.; Formenti, Giulio; Fungtammasan, Arkarachai; Howe, Kerstin; Jain, Chirag; Koren, Sergey; Logsdon, Glennis A.; et al (June 2022, Nature Methods)

Full Text Available
Divergent sensory and immune gene evolution in sea turtles with contrasting demographic and life histories

https://doi.org/10.1073/pnas.2201076120

Bentley, Blair P.; Carrasco-Valenzuela, Tomás; Ramos, Elisa K.; Pawar, Harvinder; Souza Arantes, Larissa; Alexander, Alana; Banerjee, Shreya M.; Masterson, Patrick; Kuhlwilm, Martin; Pippel, Martin; et al (February 2023, Proceedings of the National Academy of Sciences)

Sea turtles represent an ancient lineage of marine vertebrates that evolved from terrestrial ancestors over 100 Mya. The genomic basis of the unique physiological and ecological traits enabling these species to thrive in diverse marine habitats remains largely unknown. Additionally, many populations have drastically declined due to anthropogenic activities over the past two centuries, and their recovery is a high global conservation priority. We generated and analyzed high-quality reference genomes for the leatherback ( Dermochelys coriacea ) and green ( Chelonia mydas ) turtles, representing the two extant sea turtle families. These genomes are highly syntenic and homologous, but localized regions of noncollinearity were associated with higher copy numbers of immune, zinc-finger, and olfactory receptor (OR) genes in green turtles, with ORs related to waterborne odorants greatly expanded in green turtles. Our findings suggest that divergent evolution of these key gene families may underlie immunological and sensory adaptations assisting navigation, occupancy of neritic versus pelagic environments, and diet specialization. Reduced collinearity was especially prevalent in microchromosomes, with greater gene content, heterozygosity, and genetic distances between species, supporting their critical role in vertebrate evolutionary adaptation. Finally, diversity and demographic histories starkly contrasted between species, indicating that leatherback turtles have had a low yet stable effective population size, exhibit extremely low diversity compared with other reptiles, and harbor a higher genetic load compared with green turtles, reinforcing concern over their persistence under future climate scenarios. These genomes provide invaluable resources for advancing our understanding of evolution and conservation best practices in an imperiled vertebrate lineage.
more » « less
Full Text Available
Complete vertebrate mitogenomes reveal widespread repeats and gene duplications

https://doi.org/10.1186/s13059-021-02336-9

Formenti, Giulio; Rhie, Arang; Balacco, Jennifer; Haase, Bettina; Mountcastle, Jacquelyn; Fedrigo, Olivier; Brown, Samara; Capodiferro, Marco Rosario; Al-Ajli, Farooq O.; Ambrosini, Roberto; et al (December 2021, Genome Biology)

Abstract Background Modern sequencing technologies should make the assembly of the relatively small mitochondrial genomes an easy undertaking. However, few tools exist that address mitochondrial assembly directly. Results As part of the Vertebrate Genomes Project (VGP) we develop mitoVGP, a fully automated pipeline for similarity-based identification of mitochondrial reads and de novo assembly of mitochondrial genomes that incorporates both long (> 10 kbp, PacBio or Nanopore) and short (100–300 bp, Illumina) reads. Our pipeline leads to successful complete mitogenome assemblies of 100 vertebrate species of the VGP. We observe that tissue type and library size selection have considerable impact on mitogenome sequencing and assembly. Comparing our assemblies to purportedly complete reference mitogenomes based on short-read sequencing, we identify errors, missing sequences, and incomplete genes in those references, particularly in repetitive regions. Our assemblies also identify novel gene region duplications. The presence of repeats and duplications in over half of the species herein assembled indicates that their occurrence is a principle of mitochondrial structure rather than an exception, shedding new light on mitochondrial genome evolution and organization. Conclusions Our results indicate that even in the “simple” case of vertebrate mitogenomes the completeness of many currently available reference sequences can be further improved, and caution should be exercised before claiming the complete assembly of a mitogenome, particularly from short reads alone.
more » « less
Full Text Available
Semi-automated assembly of high-quality diploid human reference genomes

https://doi.org/10.1038/s41586-022-05325-5

Jarvis, Erich D.; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R.; Porubsky, David; et al (November 2022, Nature)

Abstract The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society 1,2 . However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals 3,4 . Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome 5 . To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity 6 . Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
more » « less
Full Text Available
Recombination between heterologous human acrocentric chromosomes

https://doi.org/10.1038/s41586-023-05976-y

Guarracino, Andrea; Buonaiuto, Silvia; de Lima, Leonardo Gomes; Potapova, Tamara; Rhie, Arang; Koren, Sergey; Rubinstein, Boris; Fischer, Christian; Abel, Haley J.; Antonacci-Fulton, Lucinda L.; et al (May 2023, Nature)

Abstract The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications 1,2 . Although the resolution of these regions in the first complete assembly of a human genome—the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13)—provided a model of their homology 3 , it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium 4 (HPRC), we find that contigs from all of the SAACs form a community. A variation graph 5 constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination 6,7 . The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations 8 , and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago 9 .
more » « less
Full Text Available
A draft human pangenome reference

https://doi.org/10.1038/s41586-023-05896-x

Liao, Wen-Wei; Asri, Mobin; Ebler, Jana; Doerr, Daniel; Haukness, Marina; Hickey, Glenn; Lu, Shuangjia; Lucas, Julian K.; Monlong, Jean; Abel, Haley J.; et al (May 2023, Nature)

Abstract Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals 1 . These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
more » « less
Full Text Available
The complete sequence of a human genome

https://doi.org/10.1126/science.abj6987

Nurk, Sergey; Koren, Sergey; Rhie, Arang; Rautiainen, Mikko; Bzikadze, Andrey V.; Mikheenko, Alla; Vollger, Mitchell R.; Altemose, Nicolas; Uralsky, Lev; Gershman, Ariel; et al (April 2022, Science)

Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion–base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
more » « less
Full Text Available

« Prev Next »

Search for: All records